An Efficient Framework for Clustered Federated Learning

Neural Information Processing Systems

We address the problem of Federated Learning (FL) where users are distributed and partitioned into clusters. This setup captures settings where different groups of users have their own objectives (learning tasks), but by aggregating their data with others in the same cluster (same learning task), they can leverage strength in numbers to perform Federated Learning more efficiently. We propose a new framework dubbed the Iterative Federated Clustering Algorithm (IFCA), which alternately estimates the cluster identities of the users and optimizes the model parameters for the user clusters via gradient descent. We analyze the convergence rate of this algorithm first in a linear model with squared loss and then for generic strongly convex and smooth loss functions. We show that in both settings, with good initialization, IFCA converges at an exponential rate, and we discuss the optimality of the statistical error rate. When the clustering structure is ambiguous, we propose to train the models by combining IFCA with the weight-sharing technique from multi-task learning. In our experiments, we show that the algorithm can succeed even when the initialization requirements are relaxed, using random initialization with multiple restarts. We also present experimental results showing that our algorithm is effective on non-convex problems such as neural networks, and we demonstrate the benefits of IFCA over the baselines on several clustered FL benchmarks.
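
A minimal sketch of one IFCA round for the linear-model case (numpy, hypothetical variable names; a sketch of the alternating scheme described above, not the authors' code):

import numpy as np

def ifca_round(thetas, user_data, lr=0.1):
    # One IFCA round: each user estimates its cluster identity by
    # picking the model with the lowest loss on its own data, then
    # contributes a gradient step to that cluster's model.
    k = len(thetas)
    grads = [np.zeros_like(t) for t in thetas]
    counts = [0] * k
    for X, y in user_data:                      # each user's (X, y)
        losses = [np.mean((X @ t - y) ** 2) for t in thetas]
        j = int(np.argmin(losses))              # cluster-identity estimate
        grads[j] += 2 * X.T @ (X @ thetas[j] - y) / len(y)
        counts[j] += 1
    # Server averages gradients within each cluster and updates.
    return [t - lr * g / max(c, 1) for t, g, c in zip(thetas, grads, counts)]

Running this for several rounds from k randomly initialized models (with multiple restarts, as in the experiments) alternates the two steps the abstract describes.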


Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

Neural Information Processing Systems

In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed but initially unknown $d_1 \times d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrictions of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea from \cite{jun2019bilinear} by using Stein's method for the subspace estimation and then leverages the estimated subspaces via a regularization idea. Furthermore, we markedly improve the efficiency of G-ESTT by instead using a novel exclusion idea on the estimated subspace, and propose the G-ESTS framework. We also show that, under some mild conditions, both of our methods are the first algorithms to achieve the optimal $\tilde{O}((d_1+d_2)r\sqrt{T})$ regret bound presented in \cite{lu2021low} up to logarithmic factors, improving upon the current regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT})$~\citep{lu2021low}. For completeness, we conduct experiments illustrating that our proposed algorithms, especially G-ESTS, are computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods on a suite of simulations.
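
For concreteness, the reward model can be sketched as follows (a minimal illustration with a logistic link; the precise GLM assumptions are in \cite{lu2021low}, and the link function here is a stand-in):

import numpy as np

def expected_reward(X, Theta, link=lambda z: 1.0 / (1.0 + np.exp(-z))):
    # Generalized low-rank reward: mu(<X, Theta*>), where <., .> is the
    # trace inner product and Theta* is d1 x d2 with rank r << min(d1, d2).
    return link(np.trace(X.T @ Theta))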


Info-Coevolution: An Efficient Framework for Data Model Coevolution

Qin, Ziheng, Xu, Hailun, Yew, Wei Chee, Jia, Qi, Luo, Yang, Sarkar, Kanchan, Guan, Danhui, Wang, Kai, You, Yang

arXiv.org Artificial Intelligence

Machine learning relies heavily on data, yet the continuous growth of real-world data poses challenges for efficient dataset construction and training. A fundamental yet unsolved question is: given our current model and data, does a new data sample (or batch) need annotation or learning? Conventional approaches retain all available data, leading to suboptimal data and training efficiency. Active learning aims to reduce data redundancy by selecting a subset of samples to annotate, but it increases pipeline complexity and introduces bias. In this work, we propose Info-Coevolution, a novel framework that efficiently enables models and data to coevolve through online selective annotation without bias. Leveraging task-specific models (and open-source models), it selectively annotates and integrates online and web data to improve datasets efficiently. For real-world datasets such as ImageNet-1K, Info-Coevolution reduces annotation and training costs by 32% without performance loss. It determines the saving ratio automatically, with no need to tune it, and can further reduce the annotation ratio to 50% with semi-supervised learning. We also explore retrieval-based dataset enhancement using unlabeled open-source data. Code is available at https://github.com/NUS-HPC-AI-Lab/Info-Coevolution/.
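
The abstract does not spell out the selection rule; the following is a minimal sketch of online selective annotation under one plausible criterion (predictive entropy of the current model, with a hypothetical threshold and hypothetical model methods):

import numpy as np

def should_annotate(probs, threshold=0.5):
    # Skip annotation when the current model is already confident,
    # i.e., when its predictive entropy is low (assumed criterion).
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    return entropy > threshold

def coevolve_step(model, x, dataset, oracle):
    probs = model.predict_proba(x)        # task-specific model's belief
    if should_annotate(probs):
        dataset.append((x, oracle(x)))    # annotate and integrate
        model.fit_incremental(dataset)    # data and model coevolve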


Review for NeurIPS paper: An Efficient Framework for Clustered Federated Learning

Neural Information Processing Systems

Additional Feedback: Empirical Analysis: - The approach is not compared to related work. Straightforward baselines would be the clustering-on-the-central-machine approach [9] or the fine-tuning of global models [7, 35], which are cited in the paper. Theoretical Analysis: My main concern with the theoretical analysis is the assumption that the initial models are already very close to their correct clusters (within 1/4 of the minimum distance between cluster centers for the linear models; for strongly convex problems an additional factor comes in that depends on the strong convexity and smoothness of the loss). I would argue that if the models were initialized this way, then performing a clustering on the initial models should already give the right clusters. A minor issue is that the convergence rate does not seem to account for the number of participating workers (line 4 of Algo.
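
For concreteness, the initialization condition the reviewer paraphrases can be written as (a plausible formalization for the linear case, not a quote from the paper):

\[
\|\theta_j^{(0)} - \theta_j^*\|_2 \le \tfrac{1}{4}\,\Delta,
\qquad
\Delta := \min_{j \ne j'} \|\theta_j^* - \theta_{j'}^*\|_2,
\]

i.e., each initial model must lie within a quarter of the minimum separation between the true cluster centers.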


Review for NeurIPS paper: An Efficient Framework for Clustered Federated Learning

Neural Information Processing Systems

Reviewers agree that the central idea is simple, which can be seen as a strength, and that the analysis is valuable. The concern about comparison only to baselines, and not to a more real-world method, will be rectified by including the promised comparison to ClusteredFL. Since this comparison was not included at submission time, we must assume the methods will be on par, and therefore the significance of the result is reduced. The statements about reduced computation at the central server can also be accompanied by statements about privacy benefits (not sending user data to the server), even given the provisos at line 347.


EfQAT: An Efficient Framework for Quantization-Aware Training

Ashkboos, Saleh, Verhoef, Bram, Hoefler, Torsten, Eleftheriou, Evangelos, Dazzi, Martino

arXiv.org Artificial Intelligence

Quantization-aware training (QAT) schemes have been shown to achieve near-full-precision accuracy. They accomplish this by training a quantized model for multiple epochs. This is computationally expensive, mainly because of the full-precision backward pass. On the other hand, post-training quantization (PTQ) schemes do not involve training and are therefore computationally cheap, but they usually result in a significant accuracy drop. We address these challenges by proposing EfQAT, which generalizes both schemes by optimizing only a subset of the parameters of a quantized model. EfQAT starts by applying a PTQ scheme to a pre-trained model and only updates the most critical network parameters while freezing the rest, accelerating the backward pass. We demonstrate the effectiveness of EfQAT on various CNNs and Transformer-based models using different GPUs. Specifically, we show that EfQAT is significantly more accurate than PTQ with little extra compute. Furthermore, EfQAT can accelerate the QAT backward pass by 1.44-1.64x while retaining most of the accuracy.
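
A minimal sketch of the freeze-most-parameters idea (PyTorch; the importance criterion here, gradient magnitude on a calibration batch, is an assumption, since the abstract does not name the one EfQAT uses):

import torch

def freeze_all_but_critical(model, calib_batch, loss_fn, keep_frac=0.1):
    # Score each parameter tensor by its mean gradient magnitude on a
    # calibration batch (assumed criterion), then keep only the top
    # fraction trainable, freezing the rest to speed up the backward pass.
    inputs, targets = calib_batch
    loss_fn(model(inputs), targets).backward()
    scores = {n: p.grad.abs().mean().item()
              for n, p in model.named_parameters() if p.grad is not None}
    k = max(1, int(keep_frac * len(scores)))
    cutoff = sorted(scores.values(), reverse=True)[k - 1]
    for n, p in model.named_parameters():
        p.requires_grad_(scores.get(n, 0.0) >= cutoff)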


BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction

Wang, Dong, Salamatian, Kavé, Xia, Yunqing, Deng, Weiwei, Zhiang, Qi

arXiv.org Artificial Intelligence

Although deep pre-trained language models have shown promising benefits in a large set of industrial scenarios, including Click-Through-Rate (CTR) prediction, it is challenging to integrate pre-trained language models, which handle only textual signals, into a prediction pipeline with non-textual features. So far, two directions have been explored for integrating multi-modal inputs in the fine-tuning of pre-trained language models. The first fuses the outcome of the language model and the non-textual features through an aggregation layer, resulting in an ensemble framework where the cross-information between textual and non-textual inputs is learned only in the aggregation layer. The second splits non-textual features into fine-grained fragments and transforms the fragments into new tokens combined with the textual ones, so that they can be fed directly to the transformer layers of the language model. However, this approach increases the complexity of learning and inference because of the numerous additional tokens. To address these limitations, we propose in this work a novel framework, BERT4CTR, with a Uni-Attention mechanism that can benefit from the interactions between non-textual and textual features while maintaining low time costs in training and inference through dimensionality reduction. Comprehensive experiments on both public and commercial data demonstrate that BERT4CTR significantly outperforms state-of-the-art frameworks for handling multi-modal inputs and is applicable to CTR prediction.
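
A toy sketch of the general pattern, projecting non-textual features into the text encoder's space and letting them attend over token embeddings (hypothetical shapes and names; this is a generic cross-attention stand-in, not the paper's Uni-Attention implementation):

import torch
import torch.nn as nn

class TextNonTextFusion(nn.Module):
    # Non-textual features form the query; text token embeddings
    # supply keys/values. One attention layer, then a CTR head.
    def __init__(self, d_text=768, d_feat=32, d_model=128):
        super().__init__()
        self.q = nn.Linear(d_feat, d_model)    # dimensionality reduction
        self.kv = nn.Linear(d_text, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4,
                                          batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, token_emb, feats):       # (B, L, d_text), (B, d_feat)
        q = self.q(feats).unsqueeze(1)         # (B, 1, d_model)
        kv = self.kv(token_emb)                # (B, L, d_model)
        fused, _ = self.attn(q, kv, kv)
        return torch.sigmoid(self.head(fused.squeeze(1)))  # CTR estimate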


GCformer: An Efficient Framework for Accurate and Scalable Long-Term Multivariate Time Series Forecasting

Zhao, YanJun, Ma, Ziqing, Zhou, Tian, Sun, Liang, Ye, Mengni, Qian, Yi

arXiv.org Artificial Intelligence

Transformer-based models have emerged as promising tools for time series forecasting. However, these models cannot make accurate predictions for long input time series. On the one hand, they fail to capture global dependencies within time series data. On the other hand, a long input sequence usually leads to a large model size and high time complexity. To address these limitations, we present GCformer, which combines a structured global convolutional branch for processing long input sequences with a local Transformer-based branch for capturing short, recent signals. A cohesive framework for the global convolution kernel is introduced, utilizing three distinct parameterization methods. The structured convolutional kernel selected for the global branch is specifically crafted with sublinear complexity, allowing for the efficient and effective processing of lengthy and noisy input signals. Empirical studies on six benchmark datasets demonstrate that GCformer outperforms state-of-the-art methods, reducing MSE in multivariate time series benchmarks by 4.38% and model parameters by 61.92%. In particular, the global convolutional branch can serve as a plug-in block to enhance the performance of other models, including various recently published Transformer-based models, with an average improvement of 31.93%. Our code is publicly available at https://github.com/zyj-111/GCformer.
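
A toy sketch of the dual-branch pattern (an FFT-based global convolution over the full input plus a local Transformer on the recent window; hypothetical sizes, and a plain learned kernel rather than the paper's structured parameterizations — see the released code at the URL above for the real implementation):

import torch
import torch.nn as nn

class DualBranch(nn.Module):
    # Global branch: circular convolution via FFT over the whole input.
    # Local branch: a small Transformer over the most recent steps.
    def __init__(self, seq_len=512, d=64, n_local=48):
        super().__init__()
        self.kernel = nn.Parameter(torch.randn(seq_len, d) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4,
                                           batch_first=True)
        self.local = nn.TransformerEncoder(layer, num_layers=2)
        self.n_local = n_local

    def forward(self, x):                      # x: (B, seq_len, d)
        Xf = torch.fft.rfft(x, dim=1)
        Kf = torch.fft.rfft(self.kernel, dim=0)
        glob = torch.fft.irfft(Xf * Kf, n=x.size(1), dim=1)
        loc = self.local(x[:, -self.n_local:])
        return glob[:, -self.n_local:] + loc   # combine the two branches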